[dylink] Fix deadlock in _emscripten_dlsync_threads()#27000
Conversation
| DBG("waking main runtime thread using _emscripten_thread_notify"); | ||
| if (ret) { | ||
| // Wake up the target thread in case it is blocked in | ||
| // `emscripten_futex_wait`. |
There was a problem hiding this comment.
This is little confusing (and maybe wasteful) though since normal pthread do not run their proxy queue in emscripten_yield, and therefore wakeing them them up should not be needed.
pthreads do however call _emscripten_process_dlopen_queue from emscripten_yield .. so it should only be necessary to wake them up if/when need need to call _emscripten_process_dlopen_queue, and not whereever work gets proxied to them.
There was a problem hiding this comment.
I'm not sure if _emscripten_thread_notify() is actually that expensive, since it already returns early when the thread isn't waiting.
I tried limiting this notify to the dlopen() proxying queue with commit 72cb72a, but reintroducing && pthread_equal(target_thread, emscripten_main_runtime_thread_id()) regressed this again.
There was a problem hiding this comment.
pthreads do however call _emscripten_process_dlopen_queue from
emscripten_yield[...]
I think that's probably why reintroducing && pthread_equal(target_thread, emscripten_main_runtime_thread_id()) caused this regression again, as the queue is processed from pthreads rather than from the main runtime thread.
There was a problem hiding this comment.
I'm not really worried about the code of doing the _emscripten_thread_notify. But I would really like to know why it is needed. Because pthreads don't run their messge queue on wakeup I don't see any reason to wake them when we put something in the queue.
pthreads should only need to wakeup if they are need to process the dlopen queue, right?
There was a problem hiding this comment.
pthreads should only need to wakeup if they are need to process the dlopen queue, right?
Correct, the latest revision of this PR already restricts wakeups to the dlopen queue.
This extends commit 662cb06 by ensuring `_emscripten_thread_notify()` is also called for the synchronous `_emscripten_dlsync_threads()` path. This ensures that threads process the dlopen catch-up queue promptly, even when blocked in `emscripten_futex_wait()`. To implement this cleanly, `dlopen_proxying_queue` is moved from a dynamic pointer in `dynlink.c` to a statically initialized queue in `proxying.c`. Fixes: emscripten-core#26913.
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (2) test expectation files were updated by running the tests with `--rebaseline`: ``` codesize/test_codesize_minimal_pthreads.json: 26180 => 26179 [-1 bytes / -0.00%] codesize/test_codesize_minimal_pthreads_memgrowth.json: 26589 => 26588 [-1 bytes / -0.00%] Average change: -0.00% (-0.00% - -0.00%) ```
72cb72a to
d7be9f4
Compare
_emscripten_dlsync_threads() path_emscripten_dlsync_threads()
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (2) test expectation files were updated by running the tests with `--rebaseline`: ``` codesize/test_codesize_minimal_pthreads.json: 26158 => 26157 [-1 bytes / -0.00%] codesize/test_codesize_minimal_pthreads_memgrowth.json: 26567 => 26566 [-1 bytes / -0.00%] Average change: -0.00% (-0.00% - -0.00%) ```
|
I'd really like to get this fixed for the 6.0.0 release, but I'd also love to understand the issue better rather then just always waking pthreads when they we add a message to their queue (which still seem like it should not be needed in theory). |
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (2) test expectation files were updated by running the tests with `--rebaseline`: ``` codesize/test_codesize_minimal_pthreads.json: 26147 => 26146 [-1 bytes / -0.00%] codesize/test_codesize_minimal_pthreads_memgrowth.json: 26556 => 26555 [-1 bytes / -0.00%] Average change: -0.00% (-0.00% - -0.00%) ```
The latest revision of this PR already limits the |
| self.set_setting('EXIT_RUNTIME') | ||
| self.set_setting('PROXY_TO_PTHREAD') | ||
| # Uncomment to test _emscripten_proxy_dlsync_async() | ||
| # self.set_setting('PROXY_TO_PTHREAD') |
There was a problem hiding this comment.
Can we run this in both modes using @parameterized ?
sbc100
left a comment
There was a problem hiding this comment.
Nice! Thanks for all the work on this.
This solution looks correct to me. I'm a little sad that the core proxying mechanism needs to be aware of the dynlink queue, but I'm not sure I see any way around it.
| // `_emscripten_proxy_dlsync` below, and processed by background threads | ||
| // that call `_emscripten_process_dlopen_queue` during futex_wait (i.e. whenever | ||
| // they block). | ||
| static em_proxying_queue * _Atomic dlopen_proxying_queue = NULL; |
There was a problem hiding this comment.
Could we leave the declaration of this queue here in this file? (and declare a getter for it?)
In fact, could we just declare the queue itself as extern in proxying.c (avoiding the getting completely).
There was a problem hiding this comment.
Revised this to use an extern declaration with commit 4ef62da.
There was a problem hiding this comment.
Hmm, this seems to regress code size, see e.g. commit 4d7842b.
There was a problem hiding this comment.
You mean moving back to this file causes a regression? I think co-locating it here does make the most sense unless its a big regression, which would seem very odd.
There was a problem hiding this comment.
Never mind, the code size expectations for the main branch need to be rebaselined.
https://github.com/emscripten-core/emscripten/actions/runs/26584327538/job/78325943846?pr=27000
|
|
||
| // Proxying via the dlopen or system queue may target a thread that is | ||
| // currently blocked in `emscripten_futex_wait`, so explicitly wake it | ||
| // after enqueueing the task. |
There was a problem hiding this comment.
Could you leave the original sentence in place and then add a second one: "In addition, we have a special case here for the dlopen_proxying_queue...". Maybe with a TODO do find way to avoid the need for this special case?
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (2) test expectation files were updated by running the tests with `--rebaseline`: ``` codesize/test_codesize_minimal_pthreads.json: 26146 => 26179 [+33 bytes / +0.13%] codesize/test_codesize_minimal_pthreads_memgrowth.json: 26555 => 26588 [+33 bytes / +0.12%] Average change: +0.13% (+0.12% - +0.13%) ```
sbc100
left a comment
There was a problem hiding this comment.
LGTM, but can you move the declaration of the queue back to system/lib/libc/dynlink.c?
Unfortunately, this won't work because |
Hmm. thats annoying. I guess we would need some kind of way to statically initialize a proxy queue. Its not great that the dynlink system leaks into the core prozying system like this though, so I'm not sure the advantage of the static initializer outweigh the disadvantage of the separation of concerns here. I'm hoping to remove all reference to dlopen queue from the core proxying.c (as a followup perhaps) so maybe just keeping this as lazily initialized in dynlink.c is more condusive to that future? |
|
Maybe you can use pthread_once to initialize it in its getter function? |
|
Or maybe C11 call_once? |
|
Actually, we can trivially move it back to |
Since accessing a shared variable from multiple threads without synchronization (where at least one access is a write) is always a data race and causes UB.
|
Sorry the codesize tests just got rebased. Can you rebase one more time? |
|
Just to confirm, with the changes to test_pthread_dlopen, it will now fail without this fix? Is it just one variant of it in particular requires this fix? |
I can confirm that
Yup, this fix was only required for the synchronous path in |
This extends commit 662cb06 by ensuring
_emscripten_thread_notify()is also called for the synchronous
_emscripten_dlsync_threads()path. This ensures that threads process the dlopen catch-up queue
promptly, even when blocked in
emscripten_futex_wait().Fixes: #26913.